Abstract: Identification of human disease genes can be accomplished by two strategies: functional cloning and positional 
cloning. Genetic mapping is the localization of genes underlying phenotypes on the basis of correlation with DNA variation, 
without the need for prior hypotheses about biological function; its simplest form is called linkage analysis. The ability to 
clone and sequence DNA made it possible to tie genetic linkage maps in model organisms to the underlying DNA sequence. 
Particular alleles at neighboring loci tend to be co-inherited. For tightly linked loci, this might lead to 
associations between alleles known as linkage disequilibrium (LD). Considerable effort and expense have been expended in 
whole-genome screens aimed at detection of genetic loci contributing to the susceptibility to complex human diseases. 
1. Introduction 

For most of the 20th century, genome-wide linkage 
mapping was impractical in humans: family sizes are small, 
crosses are not by design, and there were too few classical 
genetic markers to systematically trace inheritance. 
Progress in identifying the genes contributing to human 
traits was initially limited to studies of biological 
candidates such as blood-type antigens [1] and the 
hemoglobin β protein in sickle-cell anemia [2-4]. 

Marker loci near a disease gene are often observed to be 
in linkage disequilibrium with the disease; that is, the 
relative frequencies of marker alleles in affected 
individuals differ from those in the general population. 
Linkage disequilibrium occurs because each new disease- 
predisposing mutation originally appears on a single 
chromosome. Individuals who inherit a disease mutation 
are likely to also inherit the alleles of the original 
chromosome, at neighboring marker loci. As generations 
pass, recombination or mutation can disrupt the joint 
transmission of disease mutation and marker allele. 
Because recombination with the disease gene happens less 
often for nearby marker loci, markers in the immediate 
vicinity of the gene should remain in greater disequilibrium 
than more distant marker loci. The potential value of 
haplotypes defined by several single nucleotide 
polymorphisms has attracted recent interest. With sufficient 
linkage disequilibrium (LD), haplotypes could be used in 
association studies to map common alleles that might 
influence the susceptibility to common diseases, as well as 
for reconstructing the evolution of the genome. It has been 
proposed that a globally useful resource need only be based 
on high frequency variants, identified from a few modest 
samples. Rapid progress has been made in quantifying the 
pattern of human LD and haplotypes defined by such 
common variants within and among populations. However, 
the quality and utility of the proposed LD-based resource 
could be seriously compromised if important sampling and 
analytical factors are overlooked in its design. The LD map 
should be based on adequately justified criteria defined by 
sound population genetic principles. Identification of genes 
that harbor variation associated with interindividual 
differences in risk of complex diseases remains one of the 
most challenging and important problems in human 
genetics. For genetic variants that are sufficiently common 
and have sufficiently large effects, direct tests of 
association through linkage disequilibrium with anonymous 
SNPs may prove effective. But the two critical 
parameters, the frequency of risk-inflating alleles and the 
magnitudes of their effect on risk, remain largely 
unknown [5, 6]. 

2. Material and Methods 

2.1. Factors that Influence Linkage Disequilibrium 

Mutation (Figure 1) and recombination (Figure 2) might 
have the most evident impact on linkage disequilibrium 
(LD), but there are additional contributors to the extent and 
distribution of disequilibrium. Most of these involve 
demographic aspects of a population, and tend to sever the 
relationship between LD strength and the physical distance 
between loci. 

• Genetic drift. This phenomenon describes the change in 
gene and haplotype frequency in a population every 
generation owing to the random sampling of gametes that 
occurs during the production of a finite number of offspring. 
Frequency changes are accentuated in small populations. In 
general, the increased drift of small, stable (not growing) 
populations tends to increase LD, as haplotypes are lost 
from the population. Such populations might be suitable for 
disease-gene mapping, with the idea that genetic drift will 
accentuate disease and marker allele frequency differences 
between cases and controls [7-9]. However, the applicability 
of this phenomenon to gene mapping has not been well 
characterized. 

• Population growth. Rapid population growth decreases 
LD by reducing genetic drift. 

• Admixture or migration. LD can be created by 
admixture, or by migration (gene flow), between 
populations. 

Initially, LD is proportional to the allele frequency 
differences between the populations, and is unrelated to the 
distance between markers. In subsequent generations, the 
‘spurious’ LD between unlinked markers quickly dissipates, 
while LD between nearby markers is more slowly broken 
down by recombination. In theory, this would allow the 
mapping of disease genes in hybrid populations without 
using many genetic markers [10, 11]. Several admixed 
populations, such as African Americans and Hispanic 
Americans, have been characterized with this application in 
mind [12-15], but the success of this approach will depend 
heavily on the time since admixture occurred, the 
frequency differences of the disease of interest in the 
parental populations and the allele frequency differences. 
So, the diseases and circumstances for which this mapping 
approach will be feasible might turn out to be quite rare and 
exceptional. 

• Population structure. Various aspects of population 
structure are thought to influence LD. Population 
subdivision is likely to have been an important factor in 
establishing the patterns of LD in humans, but most of our 
limited knowledge comes from the study of model 
organisms. An interesting recent study of Arabidopsis 
indicated that extreme inbreeding can produce high levels 
of LD without a substantial reduction in levels of variation 
[16-18]. This neglected area would benefit from intensified 
study in humans. 

• Natural selection. There are two primary routes by 
which selection can affect the extent of disequilibrium. The 
first is a hitchhiking effect, in which an entire haplotype 
that flanks a favoured variant can be rapidly swept to high 
frequency or even fixation [19, 20]. Although the effect is 
generally milder, selection against deleterious variants can 
also inflate LD, as the deleterious haplotypes are swept 
from the population [21-23]. The second way in which 
selection can affect LD is through epistatic selection for 
combinations of alleles at two or more loci on the same 
chromosome [24, 25]. This form of selection leads to the 
association of particular alleles at different loci. Although 
this has provided a major motivation for historical studies 
of LD in Drosophila genetics, as a means of detecting the 
action of [epistatic] natural selection [26, 27], it has not yet 
been shown to alter LD in humans. 

• Variable recombination rates. Recombination rates are 
known to vary by more than an order of magnitude across 
the genome. Because breakdown of LD is primarily driven 
by recombination, the extent of LD is expected to vary in 
inverse relation to the local recombination rate. It is even 
possible that recombination is largely confined to highly 
localized recombination hot spots, with little recombination 
elsewhere. According to this view, LD will be strong across 
the non-recombining regions and break down at hot spots. 
Although there are intriguing indications that this reflects 
the situation for some regions [28, 29], the generality of the 
hot-spot phenomenon, the strength of recombination in and 
outside hot spots, and the length distributions of these 
regions remain to be determined. 

• Variable mutation rates. Some single-nucleotide 
polymorphisms, such as those at CpG dinucleotides, might 
have high mutation rates and therefore show little or no LD 
with nearby markers, even in the absence of historical 
recombination. 

• Gene conversion. In a gene conversion event, a short 
stretch of one copy of a chromosome is transferred to the 
other copy during meiosis. The effect is equivalent to two 
very closely spaced recombination events, and can break 
down LD in a manner similar to recombination or recurrent 
mutation. It has recently been shown that rates of gene 
conversion in humans are high and are important in LD 
between very tightly linked markers [30-32]. 
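
The admixture item above states that the initial LD is proportional to the allele frequency differences between the source populations and that it then decays at a rate set by recombination. A minimal sketch of that reasoning follows. It assumes a single mixing event of two populations with no LD inside either one, followed by random mating; the function names and the example frequencies are illustrative and are not taken from the cited studies.

```python
def admixture_ld(m, pA1, pA2, pB1, pB2):
    """LD coefficient D between loci A and B right after mixing.

    m        : proportion of the mixed population drawn from population 1
    pA1, pA2 : frequency of allele A in populations 1 and 2
    pB1, pB2 : frequency of allele B in populations 1 and 2
    Assumes no LD within either source population.
    """
    p_ab = m * pA1 * pB1 + (1 - m) * pA2 * pB2   # haplotype AB in the mixture
    p_a = m * pA1 + (1 - m) * pA2                # allele A in the mixture
    p_b = m * pB1 + (1 - m) * pB2                # allele B in the mixture
    return p_ab - p_a * p_b                      # equals m(1-m)(pA1-pA2)(pB1-pB2)


def ld_after(d0, c, generations):
    """D after a number of generations of random mating.

    c is the recombination fraction between the loci
    (0.5 for unlinked markers, close to 0 for tightly linked ones).
    """
    return d0 * (1 - c) ** generations


# Hypothetical example: equal mixing, large frequency differences at both loci.
d0 = admixture_ld(0.5, 0.9, 0.1, 0.8, 0.2)
for label, c in (("unlinked", 0.5), ("linked, c = 0.01", 0.01)):
    print(label, [round(ld_after(d0, c, t), 4) for t in (0, 5, 10)])
```

Under these assumptions the LD between unlinked markers halves every generation, whereas LD between markers one centimorgan apart is still close to its initial value after ten generations.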
Figure 1. Linkage disequilibrium around an ancestral mutation. The 
mutation is indicated by a red triangle. Chromosomal stretches derived 
from the common ancestor of all mutant chromosomes are shown in yellow, 
and new stretches introduced by recombination are shown in blue. 
Markers that are physically close (that is, in the yellow regions of present- 
day chromosomes) tend to remain associated with the ancestral mutation 
even as recombination limits the extent of the region of association over 
time. 




Figure 2. The erosion of linkage disequilibrium by recombination. a | At 
the outset, there is a polymorphic locus with alleles A and a. b | When a 
mutation occurs at a nearby locus, changing an allele B to b, this occurs 
on a single chromosome bearing either allele A or a at the first locus (A in 
this example). So, early in the lifetime of the mutation, only three out of 
the four possible haplotypes will be observed in the population. The b 
allele will always be found on a chromosome with the A allele at the 
adjacent locus. c | The association between alleles at the two loci will 
gradually be disrupted by recombination between the loci. d | This will 
result in the creation of the fourth possible haplotype and an eventual 
decline in LD among the markers in the population as the recombinant 
chromosome (a, b) increases in frequency. 
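
The two-locus situation in Figure 2 can be put in numbers with the usual pairwise LD measures D, D' and r². The sketch below is only an illustration: the haplotype frequencies are hypothetical, and the function name is an assumption rather than anything defined in the cited work.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D', r^2) from the four two-locus haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab
    p_B = p_AB + p_aB
    if not (0 < p_A < 1 and 0 < p_B < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_AB - p_A * p_B
    # D' rescales D by the largest magnitude it could take
    # given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2


# Panel b: the new allele b exists only on A chromosomes (no a-b haplotype).
early = ld_stats(p_AB=0.55, p_Ab=0.05, p_aB=0.40, p_ab=0.00)
# Panel d: recombination has created the a-b haplotype.
later = ld_stats(p_AB=0.55, p_Ab=0.03, p_aB=0.40, p_ab=0.02)
print("early:", early)
print("later:", later)
```

While only three haplotypes exist, |D'| equals 1 even though r² is small, because the new allele b is rare. Once the fourth haplotype appears, |D'| drops below 1.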

2.1.1. Genetic Linkage Analysis 

Today the identification of human disease genes rarely 
relies on their function, but rather on their location in the 
human genome (positional cloning or reverse genetics). An 
obstacle in early human disease gene mapping was the lack 
of genetic markers: blood groups and serum protein 
polymorphisms were the only ones available. 
These markers were technically difficult to type and they 
only covered a small part of the human genome. This was 
overcome by the introduction of DNA restriction fragment 
length polymorphisms (RFLPs) in 1980 [33-36]. RFLPs 
are based on single-base-pair alterations that affect a 
cleavage site of a restriction endonuclease. The next 
generation of DNA markers comprised the variable number 
of tandem repeats (VNTRs) [37, 38]. These highly 
informative markers are scattered throughout the genome 
and usually reflect variation in the number of repeat motifs 
of 10-20 bases per unit. A drawback of the RFLP and VNTR 
markers is that their typing involves the relatively time- 
consuming Southern blot analysis and requires large 
amounts of DNA. 
Detection of susceptibility genes in indirect association 
studies depends not only on the degree of linkage 
disequilibrium between the disease variant and the SNP 
marker but also on the difference in their allele frequencies. 
Little is known about how variations in these parameters 
may affect the power of indirect association studies among 
related populations [39-41]. 

Genetic linkage analysis is a statistical method that is 
used to relate the function of genes to their location on 
chromosomes. Neighboring genes on a chromosome have 
a tendency to stay together when passed on to offspring. 
Therefore, if a disease is often passed to offspring 
along with specific marker genes, it can be concluded 
that the gene or genes responsible for the disease are 
located close to these markers on the chromosome. 
Accurate genetic maps are required for successful and 
efficient linkage mapping of disease genes. However, most 
available genome-wide genetic maps were built using only 
small collections of pedigrees, and therefore have large 
sampling errors. A large set of genetic studies genotyped by 
the NHLBI Mammalian Genotyping Service (MGS) 
provides appropriate data for generating more accurate maps 
[42-46]. Genomic screening to map disease loci by 
association requires automation, pooling of DNA samples, 
and 3,000-6,000 highly polymorphic, evenly spaced 
microsatellite markers. Case-control samples can be used in 
an initial screen, followed by family-based data to confirm 
marker associations. Association mapping is relevant to 
genetic studies of complex diseases in which linkage 
analysis may be less effective and to cases in which multi- 
generational data are difficult to obtain, including rare or 
late-onset conditions and infectious diseases. The method 
can also be used effectively to follow up and confirm 
regions identified in linkage studies or to investigate 
candidate disease loci. Study designs can incorporate 
disease heterogeneity and interaction effects by appropriate 
subdivision of samples before screening. Pooled DNA 
amplifications have been reported to allow accurate 
determination of marker-disease associations for both case- 
control and nuclear family-based data, including 
application of correction methods for stutter artifact and 
preferential amplification. Combined with a 
discussion of both statistical power and experimental 
design to define the requirements for detecting 
disease loci while virtually eliminating false positives, these 
results suggest the feasibility and efficiency of association 
mapping using pooled DNA screening. 
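
The linkage argument at the start of this subsection (a disease that is co-inherited with a marker is probably close to it) is commonly quantified with a LOD score, which compares the likelihood of the observed meioses at a recombination fraction θ with the likelihood under free recombination (θ = 0.5). The sketch below assumes the simplest case of phase-known, fully informative meioses; the counts are hypothetical and the function name is illustrative.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD score for phase-known meioses at recombination fraction theta.

    LOD(theta) = log10[ theta^k (1 - theta)^(n - k) / 0.5^n ]
    with k recombinants among n informative meioses.
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    k, n = recombinants, meioses
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1 - theta)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical family data: 1 recombinant among 10 informative meioses.
grid = [i / 100 for i in range(1, 51)]
best_theta = max(grid, key=lambda t: lod_score(1, 10, t))
print(best_theta, round(lod_score(1, 10, best_theta), 3))
```

In this simple setting the LOD score peaks at the observed recombinant fraction k/n. Real pedigree data, with unknown phase, missing genotypes and incomplete penetrance, require dedicated linkage software.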

2.1.2. Mapping Human Disease Genes by Linkage 
Analysis 

The International HapMap Project was designed to 
create a genome-wide database of patterns of human 
genetic variation, with the expectation that these patterns 
would be useful for genetic association studies of common 
diseases. This expectation has been amply fulfilled with 
just the initial output of genome-wide association studies, 
identifying nearly 100 loci for nearly 40 common diseases 
and traits. These associations provided new insights into 
pathophysiology, suggesting previously unsuspected 
etiologic pathways for common diseases that will be of use 
in identifying new therapeutic targets and developing targeted 
interventions based on genetically defined risk. In addition, 
HapMap-based discoveries have shed new light on the 
impact of evolutionary pressures on the human genome, 
suggesting multiple loci important for adapting to disease- 
causing pathogens and new environments [47, 48]. 

2.1.3. Genomewide Scans of Complex Human Diseases 

Many “complex” human diseases, which involve 
multiple genetic and environmental determinants, have 
increased in incidence during the past 2 decades. During 
the same time period, considerable effort and expense have 
been expended in whole-genome screens aimed at detection 
of genetic loci contributing to the susceptibility to complex 
human diseases. However, the success of positional cloning 
attempts based on whole-genome screens has been limited, 
and many of the fundamental questions relating to the 
genetic epidemiology of complex human disease remain 
unanswered. Multivariate analysis suggests that the only 
factors independently associated with increased study 
success are (a) an increase in the number of individuals 
studied and (b) study of a sample drawn from only one 
ethnic group. Positional cloning based on whole-genome 
screens in complex human disease has proved more 
difficult than originally had been envisioned; detection of 
linkage and positional cloning of specific disease- 
susceptibility loci remains elusive [49, 50]. 

2.2. Assessment of Linkage Disequilibrium by the Decay 
of Haplotype Sharing, with Application to Fine-Scale 
Genetic Mapping 

Linkage disequilibrium (LD) is of great interest for gene 
mapping and the study of population history. Mary Sara 
McPeek and Andrew Strahs (1999) proposed a 
multilocus model for LD, based on the decay of haplotype 
sharing (DHS). The DHS model is most appropriate when 
the LD in which one is interested is due to the introduction 
of a variant on an ancestral haplotype, with recombinations 
in succeeding generations resulting in preservation of only 
a small region of the ancestral haplotype around the variant. 
This is generally the scenario of interest for gene mapping 
by LD. The DHS parameter is a measure of LD that can be 
interpreted as the expected genetic distance to which the 
ancestral haplotype is preserved, or, equivalently, 1/(time 
in generations to the ancestral haplotype). The method 
allows for multiple origins of alleles and for mutations, and 
it takes into account missing observations and ambiguities 
in haplotype determination, via a hidden Markov model. 
Whereas most commonly used measures of LD apply to 
pairs of loci, the DHS measure is designed for application 
to the densely mapped haplotype data that are increasingly 
available. The DHS method explicitly models the 
dependence among multiple tightly linked loci on a 
chromosome. When the assumptions about population 
structure are sufficiently tractable, the estimate of LD is 
obtained by maximum likelihood. For more-complicated 
models of population history, the authors derived means 
and covariances based on the model and solved a 
quasi-score estimating equation. 
Simulations showed that this approach works extremely 
well both for estimation of LD and for fine mapping. They 
applied the DHS method to published data sets for cystic 
fibrosis and progressive myoclonus epilepsy [51, 52]. 
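
The interpretation of the DHS parameter as 1/(time in generations) can be checked with a small simulation. This is not the hidden Markov model of the DHS method. It is a minimal sketch that assumes crossovers fall as a Poisson process along the chromosome (no interference), so that in each meiosis the distance from the variant to the nearest crossover on one side is exponential with a mean of 1 Morgan; all names and settings are illustrative.

```python
import random


def preserved_length(generations, rng):
    """Ancestral segment (in Morgans) kept on one side of the variant.

    Each meiosis places its nearest crossover at an Exp(1) distance;
    the segment that survives is cut at the closest of all of them.
    """
    return min(rng.expovariate(1.0) for _ in range(generations))


def mean_preserved_length(generations, replicates=20000, seed=1):
    """Monte Carlo average of the preserved length over many lineages."""
    rng = random.Random(seed)
    total = sum(preserved_length(generations, rng) for _ in range(replicates))
    return total / replicates


t = 50  # hypothetical age of the variant, in generations
print(mean_preserved_length(t), 1 / t)
```

Under this assumption the simulated mean is close to 1/t Morgans, so a variant about 50 generations old is expected to keep roughly 2 cM of ancestral haplotype on each side.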

2.3. Use of Gene-Specific Oligonucleotides 

This approach relies on the ability to isolate sufficient 
protein product to permit amino acid sequencing. Specific 
peptide bonds in the protein product can be cleaved using 
proteolytic enzymes such as trypsin (cuts at the carboxyl 
end of lysine or arginine residues) or reagents such as 
cyanogen bromide (cuts at the carboxyl end of methionine 
residues). The amino acid sequence of each resulting 
peptide can be determined by chemical sequencing. This 
involves a repeated series of chemical reactions in an 
automated amino acid sequencer. In each cycle, the peptide 
is exposed to a chemical that covalently bonds to the N- 
terminal amino acid and cleaves it off, allowing it to be 
identified by chromatography. Sequence overlaps identify 
overlapping peptides, enabling longer sequences to be 
assembled. 
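
The cleavage rules described above can be sketched as a simple in silico digest. The code assumes only the rules stated in the text (cut on the carboxyl side of lysine or arginine for trypsin, and of methionine for cyanogen bromide) and ignores finer enzyme specificity; the protein sequence and the function names are hypothetical.

```python
def cleave(protein, residues):
    """Split a one-letter protein sequence after each listed residue."""
    peptides, start = [], 0
    for i, amino_acid in enumerate(protein):
        if amino_acid in residues:
            peptides.append(protein[start:i + 1])  # cut on the carboxyl side
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])           # C-terminal remainder
    return peptides


def trypsin(protein):
    """Cut after lysine (K) or arginine (R)."""
    return cleave(protein, "KR")


def cyanogen_bromide(protein):
    """Cut after methionine (M)."""
    return cleave(protein, "M")


protein = "MAKWTRGMLK"  # hypothetical sequence
print(trypsin(protein))           # ['MAK', 'WTR', 'GMLK']
print(cyanogen_bromide(protein))  # ['M', 'AKWTRGM', 'LK']
```

Because the two reagents cut at different residues, each cyanogen bromide peptide spans junctions between tryptic peptides (here AKWTRGM bridges MAK, WTR and GMLK), which is what allows the longer sequence to be assembled from overlaps.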

The resulting amino acid sequence is inspected to 
identify regions containing amino acids with minimal 
codon degeneracy (e.g. methionine and tryptophan, each of 
which is specified by a single codon). Oligonucleotides 
corresponding to such regions are designed with their 
degeneracy kept low so as to increase the chance 
of identifying the correct target. Once a suitable cDNA 
clone is isolated, it can be used to screen a genomic DNA 
library in order to isolate genomic DNA clones for full 
characterization of the gene [53-55]. 
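
The search for a region of minimal codon degeneracy can be sketched as follows. The table gives the number of codons per amino acid in the standard genetic code; the window length, the example sequence and the function names are assumptions chosen for illustration.

```python
# Number of codons that specify each amino acid (standard genetic code).
CODONS_PER_AMINO_ACID = {
    "M": 1, "W": 1,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "I": 3,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "L": 6, "R": 6, "S": 6,
}


def degeneracy(peptide):
    """Number of distinct coding sequences that could encode the peptide."""
    product = 1
    for amino_acid in peptide:
        product *= CODONS_PER_AMINO_ACID[amino_acid]
    return product


def least_degenerate_window(protein, length=6):
    """Return the stretch of `length` residues with the lowest degeneracy."""
    if len(protein) < length:
        raise ValueError("protein is shorter than the requested window")
    windows = [protein[i:i + length] for i in range(len(protein) - length + 1)]
    return min(windows, key=degeneracy)


peptide = "LSRMWKDCLA"  # hypothetical sequenced peptide
best = least_degenerate_window(peptide, length=5)
print(best, degeneracy(best))  # MWKDC 8
```

A window rich in methionine and tryptophan needs far fewer oligonucleotide variants than one rich in leucine, arginine or serine: 8 for MWKDC against 216 for LSRMW in this example.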

In most of the Western world, age-related macular 
degeneration (AMD) remains the largest single cause of 
severe visual impairment, and its prevalence continues to 
increase. It is considered to be a complex disease, in which 
multiple genes and environment play a role in pathogenesis. 
Several environmental insults are implicated, with smoking, 
serum cholesterol, hypertension, sunlight exposure, and 
many other factors being variously associated with disease 
pathogenesis. Until recently, there have been relatively few 
breakthroughs to further our understanding of the genetics 
of AMD, despite remarkable progress in molecular genetic 
techniques over the last 20 years, and the fact that many 
rare inherited macular diseases have had their causative 
genes mapped. The development of new tools such as high- 
density single-nucleotide polymorphism chips and 
microarrays has changed the face of genetic research, but 
has yet to translate directly into improved clinical 
outcomes in ophthalmology. However, with the recent 
finding that the Tyr402His polymorphism in the complement 
factor H gene is implicated in AMD, a new wave of research 
in this disease can be expected. 
Not only does the identification of a biologically plausible 
gene identify a new pathway, but it also identifies new 
biological mechanisms for disease, avenues to pursue 
treatment, and a better understanding of how the 
environment interacts with the genetic background to create 
disease [55-58]. 

Familial clustering of a disease is a direct indicator of a 
possible heritable cause, provided that environmental 
sharing can be excluded. If the familial clustering is lacking, 
the likelihood of a heritable influence is also small. In the 
era of genome scans, the consideration of data on 
heritability should be important in the assessment of the 
likely success of the genome scan [59, 60]. 

2.4. Examples 

2.4.1. Schizophrenia 

Family-based linkage disequilibrium mapping using SNP 
markers is expected to be a major route to the identification 
of susceptibility alleles for complex diseases. However, 
there are a number of methodological issues yet to be 
resolved, including the handling of extended haplotype data 
and analysis of haplotype transmission in sib-pair or family 
trio samples. Javier Costas et al. (2005) analysed two 
dinucleotide repeat and six SNP markers at the COMT 
locus at chromosome 22q11, a region implicated in 
psychosis, for transmission distortion in 198 Chinese 
schizophrenic family trios. The markers were analysed 
individually, as haplotypes of paired markers using the 
program TRANSMIT, and jointly through the global P value 
for the haplotypes of all six SNP markers. The analysis 
pointed to a haplotype that 
may represent a background haplotype for the transmission 
of a schizophrenia susceptibility allele at chromosome 
22q11. Their results supported the hypotheses that either 
COMT is itself a susceptibility gene, or more likely that 
this region of chromosome 22 contains a susceptibility 
gene that is in linkage disequilibrium with COMT alleles 
[61-63]. 
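
Transmission distortion in family trios is often illustrated with the basic transmission disequilibrium test (TDT) for a single biallelic marker. The study above used the program TRANSMIT on haplotypes, so the sketch below is a simpler stand-in and not that program's method; the counts are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Basic TDT for one allele, counted over heterozygous parents.

    transmitted   : times the allele was passed to an affected child
    untransmitted : times the other allele was passed instead
    Returns (chi-square with 1 degree of freedom, p-value).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: allele transmitted 60 times, not transmitted 40 times.
chi2, p_value = tdt(60, 40)
print(chi2, round(p_value, 4))
```

Only heterozygous parents contribute, so the test is not confounded by population stratification, which is one reason trio designs are attractive for LD mapping.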

Detection of susceptibility genes in indirect association 
studies depends not only on the degree of linkage 
disequilibrium between the disease variant and the SNP 
marker but also on the difference in their allele frequencies. 
Little is known about how variations in these parameters 
may affect the power of indirect association studies among 
related populations. Therefore, these differences have to be 
an additional factor to consider when a replication study 
fails to confirm initial associations, especially if the 
replication is focused on very few markers [64, 65]. 
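
The dependence on allele frequency differences can be made concrete through r², the squared correlation between the disease variant and the marker. The sketch below assumes two biallelic loci and takes the frequencies of the two alleles that travel together; it returns the largest r² those frequencies permit, along with the commonly used approximation that an indirect study needs about 1/r² times the sample size of a direct one. Names and numbers are illustrative.

```python
def max_r2(p_disease, p_marker):
    """Largest possible r^2 between two positively associated alleles."""
    if not (0 < p_disease < 1 and 0 < p_marker < 1):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    lo, hi = sorted((p_disease, p_marker))
    # With lo <= hi, D cannot exceed lo * (1 - hi), which gives this ratio.
    return lo * (1 - hi) / (hi * (1 - lo))


def indirect_sample_size(n_direct, r2):
    """Approximate sample size needed when testing the marker instead."""
    return n_direct / r2


# Hypothetical: disease allele at 10%, marker allele at 40%.
r2 = max_r2(0.10, 0.40)
print(round(r2, 3), round(indirect_sample_size(1000, r2)))
```

Even if the disease allele is found only on chromosomes carrying the marker allele, r² here cannot exceed about 0.17. A marker typed in a second population with different allele frequencies can therefore fail to replicate an association without the original finding being false.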

2.4.2. Fragile X 

An origin of bidirectional DNA replication was mapped 
to the promoter of the FMR1 gene in human chromosome 
Xq27.3, which has been linked to the fragile X syndrome. 
This origin is adjacent to a CpG island and overlaps the site 
of expansion of the triplet repeat (CGG) at the fragile X 
instability site, FRAXA. The promoter region of FMR2 in 
the FRAXE site (approximately 600 kb away, in 
chromosome band Xq28) also includes an origin of 
replication, as previously described [34]. FMR1 
transcripts were detected in foreskin and male fetal lung 
fibroblasts, while FMR2 transcripts were not. However, 
both FMR1 and FMR2 were found to replicate late in S 
phase (approximately 6 h into the S phase of normal human 
fibroblasts). The position of the origin of replication 
relative to the CGG repeat, and perhaps the late replication 
of these genes, might be important factors in the 
susceptibility to triplet repeat amplification at the FRAXA 
and FRAXE sites [66, 67]. 

3. Discussion 

3.1. Likelihood Analysis of Disequilibrium Mapping, and 
Related Problems 

The sampling distribution provides a basis for maximum- 
likelihood estimation of the recombination rate, the mutation 
rate, or the age of the allele, provided that the two other 
parameters are known. This theory is applied to [1], the data 
of Hästbacka et al., to estimate the recombination rate 
between a locus associated with diastrophic dysplasia and a 
linked RFLP marker; [49], the data of Risch et al., to 
estimate the age of a presumptive allele causing idiopathic 
torsion dystonia in Ashkenazi Jews; and [39], the data of 
Tishkoff et al., to estimate the date at which, at the CD4 
locus, non-African lineages diverged from African lineages. 
Bruce Rannala and Montgomery Slatkin found that the 
extent of linkage disequilibrium can lead to relatively 
accurate estimates of recombination and mutation rates and 
that those estimates are not very sensitive to parameters, such 
as the population age, whose values are not known with 
certainty. In contrast, they also concluded that, in many cases, 
linkage disequilibrium may not lead to useful estimates of 
allele age, because of the relatively large degree of 
uncertainty in those estimates [68, 69]. 
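
The idea that one parameter can be estimated once the others are fixed can be illustrated with a deliberately simplified model. It is not the sampling theory of Rannala and Slatkin: it ignores mutation, the genealogy shared by disease chromosomes, population growth and the marker allele's frequency on normal chromosomes. It assumes only that a disease chromosome still carries the ancestral marker allele with probability (1 - θ)^g after g generations. All names and numbers are hypothetical.

```python
import math


def log_likelihood(k, n, theta, generations):
    """Binomial log-likelihood that k of n disease chromosomes are ancestral."""
    p = (1 - theta) ** generations
    return k * math.log(p) + (n - k) * math.log(1 - p)


def theta_mle(k, n, generations):
    """Recombination fraction maximizing the likelihood when the age is known."""
    return 1 - (k / n) ** (1 / generations)


def age_mle(k, n, theta):
    """Allele age (generations) maximizing the likelihood when theta is known."""
    return math.log(k / n) / math.log(1 - theta)


# Hypothetical sample: 90 of 100 disease chromosomes carry the ancestral allele.
theta_hat = theta_mle(90, 100, generations=100)
print(theta_hat, age_mle(90, 100, theta_hat))
```

The closed forms follow from setting (1 - θ)^g equal to the observed proportion k/n. The two estimators are exact inverses of each other here, which also shows why θ and the age cannot both be estimated from a single marker.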

3.2. Disequilibrium Likelihoods for Fine-Scale Mapping 
of a Rare Allele 

Genetic linkage studies based on pedigree data have 
limited resolution, because of the relatively small number 
of segregations. Disequilibrium mapping, which uses 
population associations to infer the location of a disease 
mutation, provides one possible strategy for narrowing the 
candidate region. The coalescent process provides a model 
for the ancestry of a sample of disease alleles, and 
recombination events between disease locus and marker 
may be placed on this ancestral phylogeny. These events 
define the recombinant classes, the sets of sampled disease 
copies descending from the meiosis at which a given 
recombination occurred. Jinko Graham and Elizabeth A. 
Thompson (1998) showed how Monte Carlo 
generation of the recombinant classes leads to a linkage 
likelihood for fine-scale mapping from disease haplotypes. 
They compared single-marker disequilibrium mapping with 
interval-disequilibrium mapping and discussed how the 
approach may be extended to multipoint-disequilibrium 
mapping. The method and its properties are illustrated with 
an example of simulated data, constructed to be typical of 
fine-scale mapping of a rare disease in the Japanese 
population. The method can take into account known 
features of population history, such as changing patterns of 
population growth [70, 71]. 

4. Conclusions 

The debate over the average extent of LD that is 
useful for association mapping is becoming narrower as 
data become available for more genomic regions and 
populations. Perhaps more importantly, it has become clear 
that the average extent might not be a good guide for the 
design and feasibility of LD mapping approaches. This is 
true for at least two reasons. First, the tremendous variability 
in the extent of LD from one region of the genome to another 
means that the average will greatly overstate the useful range 
for some regions and understate it for others. Second, even in 
a region of high mean LD, some pairs of loci do not show 
useful levels of LD due to gene conversion, differences in 
allele frequency and perhaps other factors. An important and 
almost entirely unanswered question is whether the patterns 
of LD found in one population will be replicated in other 
populations with differing population histories. What little 
data can be applied to answering this question are 
conflicting, with hints that patterns of LD are similar among 
different populations matched by indications that each 
population is substantially different. Answering this question, 
and establishing the generality (or not) of haplotype maps 
constructed in one population, should be an urgent priority 
for research. It is also worth briefly noting that the use of LD 
for mapping relies on assumptions regarding the genetic 
architecture of common diseases that are open to question; 
this point is discussed extensively elsewhere [72-74]. A less 
practical but perhaps more interesting question is what forces 
have shaped the patterns of LD in humans. An increasingly 
persuasive case can be made that simple demographic 
models of population expansions and contractions are 
insufficient to explain the observed patterns. More complex 
historical models might do better, but molecular forces, such 
as gene conversion [75] and recombination hot spots [76], 
have also recently come to the fore. Selection — positive, 
negative or balancing — must also have had an influence, 
but its role has been difficult to show conclusively. Sorting 
out these factors might occupy students of LD long after its 
more utilitarian uses have played themselves out. 